packages.used=c("rvest", "tibble", "qdap",
                "sentimentr", "gplots", "dplyr",
                "tm", "syuzhet", "factoextra",
                "beeswarm", "scales", "RColorBrewer",
                "RANN", "topicmodels", "wordcloud",
                "tidytext", "xlsx", "ggplot2")
# check packages that need to be installed.
packages.needed=setdiff(packages.used,
intersect(installed.packages()[,1],
packages.used))
# install additional packages
if(length(packages.needed)>0){
install.packages(packages.needed, dependencies = TRUE)
}
# load packages
library("rvest")
library("tibble")
library("qdap")
library("sentimentr")
library("gplots")
library("dplyr")
library("tm")
library("syuzhet")
library("factoextra")
library("beeswarm")
library("scales")
library("RColorBrewer")
library("RANN")
library("topicmodels")
library("wordcloud")
library("tidytext")
library("xlsx")
library("ggplot2")
source("../lib/plotstacked.R")
source("../lib/speechFuncs.R")
This notebook was prepared with the following environment settings.
print(R.version)
First, import all the data needed: InauguationDates.txt, InaugurationInfo.xlsx, and the full text of every speech.
inaug.dates <- read.table("../data/InauguationDates.txt",header=TRUE,sep="\t")
inaug.info <- read.xlsx("../data/InaugurationInfo.xlsx",sheetName="Sheet1",header=T,stringsAsFactors = FALSE)
inaug.info$Words <- as.numeric(inaug.info$Words)
filenames = list.files("../data/InauguralSpeeches") # get all file names
dir = paste("../data/InauguralSpeeches/",filenames,sep="")
n = length(dir)
speeches = list()
for (i in 1:n){
filename = paste("../data/InauguralSpeeches/inaug",inaug.info$File[i],"-",inaug.info$Term[i],".txt",sep="")
new.data = paste(readLines(filename, n=-1, skipNul=TRUE),collapse=" ")
speeches = c(speeches,new.data)
}
names(speeches) <- paste(inaug.info$File, inaug.info$Term,sep="-")
Let’s use “?”, “.”, “!”, “|”, “;” as end-of-sentence marks and extract all sentences from the speeches.
sentence.list=NULL
for(i in seq_along(speeches)){
sentences=sent_detect(speeches[i],
endmarks = c("?", ".", "!", "|",";"))
if(length(sentences)>0){
emotions=get_nrc_sentiment(sentences)
word.count=word_count(sentences)
emotions=diag(1/(word.count+0.01))%*%as.matrix(emotions)
sentence.list=rbind(sentence.list,
cbind(inaug.info[i,-ncol(inaug.info)],
sentences=as.character(sentences),
word.count,
emotions,
sent.id=1:length(sentences)
)
)
}
}
Delete all non-sentences that result from erroneous extra end-of-sentence marks.
sentence.list=
sentence.list%>%
filter(!is.na(word.count))
Before we analyze the length of individual sentences, let’s preview the total number of words in each speech. Draw a plot of speech length against time order.
ggplot(inaug.info) +
geom_point(aes(1:58,Words)) +
geom_smooth(aes(1:58,Words))
Notice that over time, the total number of words in a speech tends to decrease and converges to around 2,000. Since a shorter speech takes less time to deliver, we can roughly conclude that the speeches became more concise. The speakers knew that the longer a speech runs, the less interest the audience has in hearing it out, yet too few words cannot convey their political ideas precisely. The number of words in a speech therefore tends to settle within a certain range.
Now we want to find something about the length of sentences.
sentence.list$TimeOrdered=reorder(sentence.list$File,
1:nrow(sentence.list),
order=T)
sentence.list$FileOrdered=reorder(sentence.list$File,
sentence.list$word.count,
mean,
order=T)
beeswarm(word.count~TimeOrdered,
data=sentence.list,
horizontal = TRUE,
pch=16, col=alpha(brewer.pal(9, "Set1"), 0.6),
cex=0.55, cex.axis=0.44, cex.lab=0.8,
spacing=1.2/nlevels(sentence.list$FileOrdered),
las=2, xlab="Number of words in a sentence.", ylab="",
main="Inaugural Speeches")
As the plot above shows, the y-axis follows time order: George Washington is the first president and Donald J. Trump the most recent. We find that over time, the number of words in a sentence decreases; presidents tend to use fewer words per sentence.
What are these short sentences?
sentence.list%>%
filter(File=="GeorgeWashington",
word.count<=10&word.count>1)%>%
select(sentences)%>%sample_n(2)
sentence.list%>%
filter(File=="ThomasJefferson",
word.count<=5&word.count>1)%>%
select(sentences)%>%sample_n(5)
sentence.list%>%
filter(File=="AbrahamLincoln",
word.count<=5&word.count>1)%>%
select(sentences)%>%sample_n(4)
sentence.list%>%
filter(File=="FranklinDRoosevelt",
word.count<=5&word.count>1)%>%
select(sentences)%>%sample_n(5)
sentence.list%>%
filter(File=="BarackObama",
word.count<=5&word.count>1)%>%
select(sentences)%>%sample_n(5)
sentence.list%>%
filter(File=="DonaldJTrump",
word.count<=5&word.count>1)%>%
select(sentences)%>%sample_n(5)
Let’s select six famous presidents from American history, each representing a different period. George Washington represents the very beginning of American history, while Donald J. Trump stands for the most recent era.
From the output above, George Washington used a sentence of fewer than 10 words only twice, while Donald J. Trump used sentences of fewer than 5 words more than five times. George Washington and Thomas Jefferson talked more about freedom and the founding of America. Abraham Lincoln and Franklin D. Roosevelt, presidents in times of war, used their words to fight for peace and encourage people through hard days. As for Donald J. Trump and Barack Obama, presidents in peacetime who faced economic crisis, their words focused more on economic recovery.
As history moved forward, the American people faced different problems in different periods, and their presidents focused on the topics most pressing for people’s lives.
par(mfrow=c(6,1), mar=c(1,0,2,0), bty="n", xaxt="n", yaxt="n", font.main=1)
f.plotsent.len(In.list=sentence.list, InFile="GeorgeWashington",
President="George Washington")
f.plotsent.len(In.list=sentence.list, InFile="ThomasJefferson",
President="Thomas Jefferson")
f.plotsent.len(In.list=sentence.list, InFile="AbrahamLincoln",
President="Abraham Lincoln")
f.plotsent.len(In.list=sentence.list, InFile="FranklinDRoosevelt",
President="Franklin D. Roosevelt")
f.plotsent.len(In.list=sentence.list, InFile="BarackObama",
President="Barack Obama")
f.plotsent.len(In.list=sentence.list, InFile="DonaldJTrump",
President="Donald Trump")
We can see that the earlier presidents (George Washington and Thomas Jefferson) tended to use long sentences to express their opinions precisely, while the more recent presidents (Barack Obama and Donald Trump) tended to use short sentences to encourage people, convey their feelings, and build emotional resonance.
print("George Washington")
[1] "George Washington"
speech.df=tbl_df(sentence.list)%>%
filter(File=="GeorgeWashington", word.count>=4)%>%
select(sentences, anger:trust)
speech.df=as.data.frame(speech.df)
as.character(speech.df$sentences[apply(speech.df[,-1], 2, which.max)])
[1] "Previous to the execution of any official act of the President the Constitution requires an oath of office."
[2] "and since the preservation of the sacred fire of liberty and the destiny of the republican model of government are justly considered, perhaps, as deeply, as finally, staked on the experiment entrusted to the hands of the American people."
[3] "between the genuine maxims of an honest and magnanimous policy and the solid rewards of public prosperity and felicity;"
[4] "Previous to the execution of any official act of the President the Constitution requires an oath of office."
[5] "I dwell on this prospect with every satisfaction which an ardent love for my country can inspire, since there is no truth more thoroughly established than that there exists in the economy and course of nature an indissoluble union between virtue and happiness;"
[6] "From this resolution I have in no instance departed;"
[7] "Having thus imparted to you my sentiments as they have been awakened by the occasion which brings us together, I shall take my present leave;"
[8] "Previous to the execution of any official act of the President the Constitution requires an oath of office."
print("Thomas Jefferson")
[1] "Thomas Jefferson"
speech.df=tbl_df(sentence.list)%>%
filter(File=="ThomasJefferson", word.count>=5)%>%
select(sentences, anger:trust)
speech.df=as.data.frame(speech.df)
as.character(speech.df$sentences[apply(speech.df[,-1], 2, which.max)])
[1] "the supremacy of the civil over the military authority;"
[2] "The experiment has been tried;"
[3] "they saw the latent source from which these outrages proceeded;"
[4] "the supremacy of the civil over the military authority;"
[5] "peace, commerce, and honest friendship with all nations, entangling alliances with none;"
[6] "peace, commerce, and honest friendship with all nations, entangling alliances with none;"
[7] "The experiment has been tried;"
[8] "peace, commerce, and honest friendship with all nations, entangling alliances with none;"
print("Abraham Lincoln")
[1] "Abraham Lincoln"
speech.df=tbl_df(sentence.list)%>%
filter(File=="AbrahamLincoln", word.count>=4)%>%
select(sentences, anger:trust)
speech.df=as.data.frame(speech.df)
as.character(speech.df$sentences[apply(speech.df[,-1], 2, which.max)])
[1] "Whoever rejects it does of necessity fly to anarchy or to despotism."
[2] "Each looked for an easier triumph, and a result less fundamental and astounding."
[3] "'May' Congress prohibit slavery in the Territories?"
[4] "The Government will not assail 'you'."
[5] "Fondly do we hope, fervently do we pray, that this mighty scourge of war may speedily pass away."
[6] "Whoever rejects it does of necessity fly to anarchy or to despotism."
[7] "The Government will not assail 'you'."
[8] "Perpetuity is implied, if not expressed, in the fundamental law of all national governments."
print("Franklin D. Roosevelt")
[1] "Franklin D. Roosevelt"
speech.df=tbl_df(sentence.list)%>%
filter(File=="FranklinDRoosevelt", word.count>=4)%>%
select(sentences, anger:trust)
speech.df=as.data.frame(speech.df)
as.character(speech.df$sentences[apply(speech.df[,-1], 2, which.max)])
[1] "Happiness lies not in the mere possession of money;" "We shall strive for perfection."
[3] "Yet our distress comes from no failure of substance." "These are the lines of attack."
[5] "Have we found our happy valley?" "We are stricken by no plague of locusts."
[7] "power to do good." "Have we found our happy valley?"
print("Barack Obama")
[1] "Barack Obama"
speech.df=tbl_df(sentence.list)%>%
filter(File=="BarackObama", word.count>=4)%>%
select(sentences, anger:trust)
speech.df=as.data.frame(speech.df)
as.character(speech.df$sentences[apply(speech.df[,-1], 2, which.max)])
[1] "The Capital was abandoned."
[2] "This is the journey we continue today."
[3] "Our capacity remains undiminished."
[4] "Our capacity remains undiminished."
[5] "when the wages of honest labor liberate families from the brink of hardship."
[6] "The Capital was abandoned."
[7] "when the wages of honest labor liberate families from the brink of hardship."
[8] "We affirm the promise of our democracy."
print("Donald Trump")
[1] "Donald Trump"
speech.df=tbl_df(sentence.list)%>%
filter(File=="DonaldJTrump", word.count>=5)%>%
select(sentences, anger:trust)
speech.df=as.data.frame(speech.df)
as.character(speech.df$sentences[apply(speech.df[,-1], 2, which.max)])
[1] "There should be no fear."
[2] "America will start winning again, winning like never before."
[3] "America will start winning again, winning like never before."
[4] "There should be no fear."
[5] "buy American and hire American."
[6] "America will start winning again, winning like never before."
[7] "The bible tells us how good and pleasant it is when God's people live together in unity."
[8] "At the center of this movement is a crucial conviction, that a nation exists to serve its citizens."
par(mar=c(4, 6, 2, 1))
emo.means=colMeans(select(sentence.list, anger:trust)>0.01)
col.use=c("red2", "darkgoldenrod1",
"chartreuse3", "blueviolet",
"darkgoldenrod2", "dodgerblue3",
"darkgoldenrod1", "darkgoldenrod1")
barplot(emo.means[order(emo.means)], las=2, col=col.use[order(emo.means)], horiz=T, main="Inaugural Speeches")
We find that presidents predominantly convey positive feelings such as trust, which helps them win the people’s confidence.
presid.summary=tbl_df(sentence.list)%>%
group_by(File)%>%
summarise(
anger=mean(anger),
anticipation=mean(anticipation),
disgust=mean(disgust),
fear=mean(fear),
joy=mean(joy),
sadness=mean(sadness),
surprise=mean(surprise),
trust=mean(trust)
)
presid.summary=as.data.frame(presid.summary)
rownames(presid.summary)=as.character((presid.summary[,1]))
km.res=kmeans(presid.summary[,-1], centers=5, iter.max=200)
fviz_cluster(km.res,
stand=F, repel= TRUE,
data = presid.summary[,-1], xlab="", xaxt="n",
show.clust.cent=FALSE)
presid.tmp=tbl_df(sentence.list)%>%
  filter(Party=="Democratic")%>%
  group_by(File)%>%
  summarise(
    anger=mean(anger),
    anticipation=mean(anticipation),
    disgust=mean(disgust),
    fear=mean(fear),
    joy=mean(joy),
    sadness=mean(sadness),
    surprise=mean(surprise),
    trust=mean(trust)
  )
presid.tmp=as.data.frame(presid.tmp)
rownames(presid.tmp)=as.character(presid.tmp[,1])
km.res=kmeans(presid.tmp[,-1], centers=5, iter.max=200)
fviz_cluster(km.res,
             stand=F, repel= TRUE,
             data = presid.tmp[,-1], xlab="Democratic party", xaxt="n",
             show.clust.cent=FALSE)
presid.tmp=tbl_df(sentence.list)%>%
  filter(Party=="Republican")%>%
  group_by(File)%>%
  summarise(
    anger=mean(anger),
    anticipation=mean(anticipation),
    disgust=mean(disgust),
    fear=mean(fear),
    joy=mean(joy),
    sadness=mean(sadness),
    surprise=mean(surprise),
    trust=mean(trust)
  )
presid.tmp=as.data.frame(presid.tmp)
rownames(presid.tmp)=as.character(presid.tmp[,1])
km.res=kmeans(presid.tmp[,-1], centers=5, iter.max=200)
fviz_cluster(km.res,
             stand=F, repel= TRUE,
             data = presid.tmp[,-1], xlab="Republican party", xaxt="n",
             show.clust.cent=FALSE)
For topic modeling, we prepare a corpus of sentence snippets as follows. For each speech, we start with the sentences and build a snippet consisting of a given sentence together with its flanking sentences.
corpus.list=sentence.list[2:(nrow(sentence.list)-1), ]
sentence.pre=sentence.list$sentences[1:(nrow(sentence.list)-2)]
sentence.post=sentence.list$sentences[3:nrow(sentence.list)]
corpus.list$snipets=paste(sentence.pre, corpus.list$sentences, sentence.post, sep=" ")
rm.rows=(1:nrow(corpus.list))[corpus.list$sent.id==1]
rm.rows=c(rm.rows, rm.rows-1)
corpus.list=corpus.list[-rm.rows, ]
docs <- Corpus(VectorSource(corpus.list$snipets))
writeLines(as.character(docs[[sample(1:nrow(corpus.list), 1)]]))
If this reasonable expectation be not realized, I frankly confess that one of your leading hopes is doomed to disappointment, and that my efforts in a very important particular must result in a humiliating failure. Offices can be properly regarded only in the light of aids for the accomplishment of these objects, and as occupancy can confer no prerogative nor importunate desire for preferment any claim, the public interest imperatively demands that they be considered with sole reference to the duties to be performed. Good citizens may well claim the protection of good laws and the benign influence of good government, but a claim for office is what the people of a republic should never recognize.
#convert to lower case
docs <-tm_map(docs,content_transformer(tolower))
writeLines(as.character(docs[[sample(1:nrow(corpus.list), 1)]]))
the primary purpose of these agreements is to provide unmistakable proof of the joint determination of the free countries to resist armed attack from any quarter. every country participating in these arrangements must contribute all it can to the common defense. if we can make it sufficiently clear, in advance, that any armed attack affecting our national security would be met with overwhelming force, the armed attack might never occur.
#remove punctuation
docs <- tm_map(docs, removePunctuation)
writeLines(as.character(docs[[sample(1:nrow(corpus.list), 1)]]))
again whatever action congress may take will be given a fair opportunity for trial before the people are called to pass judgment upon it and this i consider a great essential to the rightful and lasting settlement of the question in view of these considerations i shall deem it my duty as president to convene congress in extraordinary session on monday the 15th day of march 1897 in conclusion i congratulate the country upon the fraternal spirit of the people and the manifestations of good will everywhere so apparent
#Strip digits
docs <- tm_map(docs, removeNumbers)
writeLines(as.character(docs[[sample(1:nrow(corpus.list), 1)]]))
i would hope that the nations of the world might say that we had built a lasting peace based not on weapons of war but on international policies which reflect our own most precious values these are not just my goalsand they will not be my accomplishmentsbut the affirmation of our nations continuing moral strength and our belief in an undiminished everexpanding american dream thank you very much
#remove stopwords
docs <- tm_map(docs, removeWords, stopwords("english"))
writeLines(as.character(docs[[sample(1:nrow(corpus.list), 1)]]))
now receive precious inheritance indebted establishment doubly bound examples left us blessings enjoyed fruits labors transmit unimpaired succeeding generation compass thirtysix years since great national covenant instituted body laws enacted authority conformity provisions unfolded powers carried practical operation effective energies subordinate departments distributed executive functions various relations foreign affairs revenue expenditures military force union land sea
#remove whitespace
docs <- tm_map(docs, stripWhitespace)
writeLines(as.character(docs[[sample(1:nrow(corpus.list), 1)]]))
however provided army enable executive suppress insurrection restore peace give security inhabitants establish authority united states throughout archipelago authorized organization native troops auxiliary regular force advised time time acts military naval officers islands action appointing civil commissions instructions charged duties powers recommendations several acts executive commission together complete general information submitted
#Stem document
docs <- tm_map(docs,stemDocument)
writeLines(as.character(docs[[sample(1:nrow(corpus.list), 1)]]))
celebr unit state america countri truli matter parti control govern whether govern control peopl
Generate the document-term matrix.
dtm <- DocumentTermMatrix(docs)
#convert rownames to filenames
rownames(dtm) <- paste(corpus.list$type, corpus.list$File,
corpus.list$Term, corpus.list$sent.id, sep="_")
rowTotals <- apply(dtm , 1, sum) #Find the sum of words in each Document
dtm <- dtm[rowTotals> 0, ]
corpus.list=corpus.list[rowTotals>0, ]
Run LDA
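The chunk that actually fits the LDA model does not appear in this section, although `ldaOut` and `k` are used below. A plausible reconstruction, assuming Gibbs sampling with k = 15 topics (the outputs below show 15 topics) and commonly used control parameters — the exact settings of the original run are unknown — would be:

```r
# Hypothetical reconstruction of the missing model-fitting step.
# Control parameters below are common defaults for Gibbs-sampled LDA,
# not the verified original settings.
burnin <- 4000
iter   <- 2000
thin   <- 500
seed   <- list(2003, 5, 63, 100001, 765)
nstart <- 5
best   <- TRUE
k      <- 15   # number of topics, matching the 15 topics shown below

ldaOut <- LDA(dtm, k, method = "Gibbs",
              control = list(nstart = nstart, seed = seed, best = best,
                             burnin = burnin, iter = iter, thin = thin))
```

With `nstart = 5` starts and `best = TRUE`, `LDA()` keeps the run with the highest posterior, so the topic assignments are reasonably stable across knits.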
ldaOut.topics <- as.matrix(topics(ldaOut))
table(c(1:k, ldaOut.topics)) # 1:k is prepended so every topic appears; each count is inflated by one
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
526 377 345 478 384 264 375 414 559 403 325 420 222 280 185
write.csv(ldaOut.topics,file=paste("../output/LDAGibbs",k,"DocsToTopics.csv"))
#top 6 terms in each topic
ldaOut.terms <- as.matrix(terms(ldaOut,20))
write.csv(ldaOut.terms,file=paste("../output/LDAGibbs",k,"TopicsToTerms.csv"))
#probabilities associated with each topic assignment
topicProbabilities <- as.data.frame(ldaOut@gamma)
write.csv(topicProbabilities,file=paste("../output/LDAGibbs",k,"TopicProbabilities.csv"))
terms.beta=ldaOut@beta
terms.beta=scale(terms.beta)
topics.terms=NULL
for(i in 1:k){
topics.terms=rbind(topics.terms, ldaOut@terms[order(terms.beta[i,], decreasing = TRUE)[1:7]])
}
topics.terms
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] "man" "belief" "blood" "built" "soul" "decent" "heritag"
[2,] "sourc" "recent" "anticip" "tranquil" "endang" "dread" "dissolut"
[3,] "program" "match" "tabl" "promptitud" "ascrib" "terrif" "stride"
[4,] "home" "longer" "higher" "job" "stop" "factori" "hero"
[5,] "servic" "discharg" "presenc" "station" "candid" "undertak" "zeal"
[6,] "fall" "earn" "although" "loyal" "roman" "bosom" "cuba"
[7,] "exercis" "grant" "sovereignti" "exclus" "texa" "thirteen" "dispos"
[8,] "said" "inaugur" "solv" "vice" "told" "dont" "stood"
[9,] "role" "everyon" "central" "asid" "horizon" "refresh" "depth"
[10,] "enforc" "provis" "minor" "elector" "employe" "interst" "alarm"
[11,] "arm" "navi" "scienc" "europ" "naval" "island" "canal"
[12,] "busi" "tax" "tariff" "manufactur" "pay" "credit" "effici"
[13,] "happi" "abund" "occup" "benign" "contempt" "sway" "providenti"
[14,] "relat" "friendship" "settlement" "agenc" "neutral" "assert" "obtain"
[15,] "individu" "safe" "class" "bad" "inflict" "cross" "escap"
ldaOut.terms
Topic 1 Topic 2 Topic 3 Topic 4 Topic 5 Topic 6 Topic 7 Topic 8 Topic 9 Topic 10
[1,] "freedom" "great" "will" "work" "shall" "peopl" "state" "time" "world" "law"
[2,] "men" "exist" "must" "home" "duti" "upon" "govern" "now" "new" "constitut"
[3,] "hope" "much" "make" "live" "may" "spirit" "power" "year" "america" "execut"
[4,] "human" "institut" "good" "product" "offic" "civil" "unit" "first" "let" "congress"
[5,] "know" "mani" "believ" "better" "servic" "faith" "union" "day" "american" "parti"
[6,] "free" "present" "never" "great" "support" "liberti" "constitut" "presid" "togeth" "upon"
[7,] "man" "caus" "like" "find" "trust" "free" "right" "god" "old" "question"
[8,] "long" "experi" "need" "american" "confid" "republ" "general" "futur" "chang" "legisl"
[9,] "life" "danger" "thing" "way" "high" "hand" "feder" "stand" "nation" "control"
[10,] "earth" "result" "done" "see" "best" "order" "within" "histori" "generat" "elect"
[11,] "seek" "patriot" "lead" "opportun" "countri" "yet" "author" "past" "respons" "subject"
[12,] "fear" "import" "continu" "must" "administr" "govern" "limit" "today" "promis" "act"
[13,] "moral" "section" "take" "need" "call" "less" "territori" "say" "today" "made"
[14,] "liberti" "form" "understand" "help" "respons" "principl" "exercis" "moment" "democraci" "depart"
[15,] "heart" "reason" "realiz" "life" "public" "place" "whole" "fellow" "come" "given"
[16,] "mankind" "howev" "alway" "look" "oblig" "noth" "protect" "oath" "strength" "power"
[17,] "rememb" "polit" "give" "children" "faith" "hold" "observ" "word" "ideal" "enforc"
[18,] "want" "influenc" "race" "even" "expect" "rest" "grant" "come" "strong" "effect"
[19,] "light" "period" "greatest" "land" "countrymen" "love" "local" "ask" "turn" "opinion"
[20,] "carri" "action" "requir" "achiev" "purpos" "purpos" "system" "problem" "end" "instrument"
Topic 11 Topic 12 Topic 13 Topic 14 Topic 15
[1,] "war" "public" "everi" "nation" "can"
[2,] "forc" "govern" "countri" "peac" "one"
[3,] "great" "revenu" "citizen" "polici" "govern"
[4,] "made" "busi" "right" "among" "may"
[5,] "progress" "demand" "interest" "foreign" "peopl"
[6,] "increas" "industri" "equal" "relat" "without"
[7,] "defens" "necessari" "just" "justic" "individu"
[8,] "trade" "condit" "prosper" "friend" "far"
[9,] "commerc" "interest" "found" "honor" "other"
[10,] "arm" "economi" "happi" "promot" "practic"
[11,] "countri" "labor" "preserv" "intern" "anoth"
[12,] "use" "protect" "bless" "secur" "becom"
[13,] "improv" "secur" "success" "maintain" "well"
[14,] "place" "tax" "secur" "ever" "mere"
[15,] "greater" "restor" "part" "independ" "now"
[16,] "bear" "import" "polit" "respect" "use"
[17,] "maintain" "burden" "common" "war" "communiti"
[18,] "extend" "larg" "principl" "advanc" "sure"
[19,] "militari" "expenditur" "essenti" "effort" "safe"
[20,] "number" "money" "well" "great" "either"
Based on the most popular terms and the most salient terms for each topic, we assign a hashtag to each topic. This part requires manual setup, as the topics are likely to change between runs.
topics.hash=c("Freedom", "great", "will", "work", "shall", "people", "state", "time", "world", "law", "war", "public", "every", "nation", "can")
corpus.list$ldatopic=as.vector(ldaOut.topics)
corpus.list$ldahash=topics.hash[ldaOut.topics]
colnames(topicProbabilities)=topics.hash
corpus.list.df=cbind(corpus.list, topicProbabilities)
orders = as.factor(as.numeric(corpus.list.df$FileOrdered)[nrow(corpus.list.df):1])
corpus.list.df<- cbind(corpus.list.df,orders)
par(mar=c(1,1,1,1))
topic.summary=tbl_df(corpus.list.df)%>%
              select(orders, Freedom:can)%>%
              group_by(orders)%>%
              summarise_all(mean)
topic.summary=as.data.frame(topic.summary)
rownames(topic.summary)=topic.summary[,1]
topic.plot=c(1, 13, 9, 11, 8, 3, 7)
print(topics.hash[topic.plot])
[1] "Freedom" "every" "world" "war" "time" "will" "state"
heatmap.2(as.matrix(topic.summary[,topic.plot+1]),
scale = "column", key=F,
col = bluered(100),
cexRow = 0.9, cexCol = 0.9, margins = c(8, 14),
trace = "none", density.info = "none")
#Step 4 - Inspect wordclouds for selected presidents
dtm.tidy=tidy(dtm)
print("GeorgeWashington")
[1] "GeorgeWashington"
# extract the File part of the "File_Term_sentid" document names robustly
rang = which(gsub("^_|_[0-9]+_[0-9]+$", "", dtm.tidy$document)=="GeorgeWashington")
dtm.tmp=summarise(group_by(dtm.tidy[rang,], term), sum(count))
wordcloud(dtm.tmp$term, dtm.tmp$`sum(count)`,
scale=c(5,0.5),
max.words=100,
min.freq=1,
random.order=FALSE,
rot.per=0.3,
use.r.layout=T,
random.color=FALSE,
colors=brewer.pal(9,"Blues"))
print("ThomasJefferson")
[1] "ThomasJefferson"
rang = which(gsub("^_|_[0-9]+_[0-9]+$", "", dtm.tidy$document)=="ThomasJefferson")
dtm.tmp=summarise(group_by(dtm.tidy[rang,], term), sum(count))
wordcloud(dtm.tmp$term, dtm.tmp$`sum(count)`,
scale=c(5,0.5),
max.words=100,
min.freq=1,
random.order=FALSE,
rot.per=0.3,
use.r.layout=T,
random.color=FALSE,
colors=brewer.pal(9,"Blues"))
print("AbrahamLincoln")
[1] "AbrahamLincoln"
rang = which(gsub("^_|_[0-9]+_[0-9]+$", "", dtm.tidy$document)=="AbrahamLincoln")
dtm.tmp=summarise(group_by(dtm.tidy[rang,], term), sum(count))
wordcloud(dtm.tmp$term, dtm.tmp$`sum(count)`,
scale=c(5,0.5),
max.words=100,
min.freq=1,
random.order=FALSE,
rot.per=0.3,
use.r.layout=T,
random.color=FALSE,
colors=brewer.pal(9,"Blues"))
print("FranklinDRoosevelt")
[1] "FranklinDRoosevelt"
rang = which(gsub("^_|_[0-9]+_[0-9]+$", "", dtm.tidy$document)=="FranklinDRoosevelt")
dtm.tmp=summarise(group_by(dtm.tidy[rang,], term), sum(count))
wordcloud(dtm.tmp$term, dtm.tmp$`sum(count)`,
scale=c(5,0.5),
max.words=100,
min.freq=1,
random.order=FALSE,
rot.per=0.3,
use.r.layout=T,
random.color=FALSE,
colors=brewer.pal(9,"Blues"))
print("BarackObama")
[1] "BarackObama"
rang = which(gsub("^_|_[0-9]+_[0-9]+$", "", dtm.tidy$document)=="BarackObama")
dtm.tmp=summarise(group_by(dtm.tidy[rang,], term), sum(count))
wordcloud(dtm.tmp$term, dtm.tmp$`sum(count)`,
scale=c(5,0.5),
max.words=100,
min.freq=1,
random.order=FALSE,
rot.per=0.3,
use.r.layout=T,
random.color=FALSE,
colors=brewer.pal(9,"Blues"))
print("DonaldJTrump")
[1] "DonaldJTrump"
rang = which(gsub("^_|_[0-9]+_[0-9]+$", "", dtm.tidy$document)=="DonaldJTrump")
dtm.tmp=summarise(group_by(dtm.tidy[rang,], term), sum(count))
wordcloud(dtm.tmp$term, dtm.tmp$`sum(count)`,
scale=c(5,0.5),
max.words=100,
min.freq=1,
random.order=FALSE,
rot.per=0.3,
use.r.layout=T,
random.color=FALSE,
colors=brewer.pal(9,"Blues"))
We have analyzed all 58 presidential inaugural speeches and found some interesting patterns behind them.
To start with, as the father of America, George Washington is a special president, not only for his great contribution to America but also for his style of speech. Among all the presidents in our study, George Washington was the most likely to use long sentences. In his inaugural speeches there are only two sentences of fewer than 10 words: “between duty and advantage” and “From this resolution I have in no instance departed”. We might speculate that later presidents, reading George Washington’s inaugural speeches, also found his sentences a bit long, so they tended to shorten their own. Thus it became a trend that, as time went by, presidents used shorter sentences and fewer words in their inaugural speeches. Shorter sentences and fewer words make a speech more precise and easier for the audience, and short words and sentences are more likely to create emotional resonance, which may help presidents gain the people’s trust.
However, George Washington and Thomas Jefferson lived in the same period yet showed quite different styles of speech. Unlike George Washington, Thomas Jefferson used shorter sentences than the presidents of his era. Recall their early environments. George Washington grew up in the countryside and received no formal education until age 15; he then learned from a local tutor and showed talent in math, geometry, and measurement. His early experience may explain a lot: trained to solve complicated problems, he did not fear difficult concepts and could easily handle long sentences. Thomas Jefferson, by contrast, received a classical education and studied history and politics early on, which may have made him a good speaker for his time. He was the author of the Declaration of Independence, written for every American, so long sentences might cause confusion; aware of this, he kept his speeches short and pithy. By now it has almost become a trend for inaugural speeches to use short sentences, perhaps because modern audiences are too tired to follow long ones.
Another interesting discovery is that presidents’ words always reflect their era and the people’s hopes. For the first two presidents, the word clouds show a focus on the equal rights of every person: they lived in a period when America had just been founded, and treating every individual’s rights equally was paramount. When it came to Abraham Lincoln, who led the country through the Civil War, and Franklin D. Roosevelt, who led it into the Second World War, the speeches were strongly tied to the topic of war and peace: we find words about union and states in Abraham Lincoln’s speeches, and peace in Franklin D. Roosevelt’s. Now consider Barack Obama and Donald Trump. Both wished to recover from the 2008 economic crisis, so they spoke more about the past, togetherness, and America, encouraging people to fight through hard days together.
Last, from the clustering of emotions we also found that positive emotions are the most common feelings in inaugural speeches. This is not hard to explain: only when a speaker shows trustworthiness will the audience trust him as a good president. What’s more, we can see some relationship between the emotion clusters and party labels: those who share the same ideas show similar emotions when giving speeches.
Let’s end our study with the most common expression in inaugural speeches: Thank you!